Abstract

In this paper, we model the cost incurred by each peer participating in a peer-to-peer network. Such
a cost model makes it possible to gauge potential disincentives for peers to collaborate, and provides a measure of
the "total cost" of a network, which is a possible benchmark to distinguish between proposals. We characterize
the cost imposed on a node as a function of the experienced load and the node connectivity, and
show how our model applies to a few proposed routing geometries for distributed hash tables (DHTs).
We further outline a number of open questions this research has raised.



*This research is supported in part by the National Science Foundation, through grant ANI-0085879. 



1 Introduction 



A key factor in the efficiency of a peer-to-peer overlay network is the level of collaboration provided by each 
peer. This paper takes a first step towards quantifying the level of collaboration that can be expected from 
each participant, by proposing a model to evaluate the cost each peer incurs for being a part of the overlay. 

Such a cost model has several useful applications, among which are (1) providing a benchmark that can
be used to compare different proposals, complementary to recent works comparing topological
properties of various overlays [7, 12], (2) allowing for predicting disincentives, and designing mechanisms
that ensure a protocol is strategyproof [16], and (3) facilitating the design of load balancing primitives.

This work is not the first attempt to characterize the cost of participating in a network. Jackson and
Wolinsky [9] proposed cost models to analyze formation strategies in social and economic networks. More
recent studies [3, 6] model (overlay) network formation as a non-cooperative game. These studies assume
that each node has the freedom to choose which links it maintains, whereas we assume that the overlay
topology is constrained by a protocol. Moreover, our approach extends previously proposed cost models
[3, 6, 9], by considering the load imposed on each node in addition to the distance to other nodes and degree
of connectivity.

In the remainder of this paper, we introduce our proposed cost model, before applying it to several
routing geometries used in recently proposed distributed hash table (DHT) algorithms [10, 12, 18, 19, 21].
We conclude by discussing some open problems this research has uncovered.

2 Proposed cost model 

The model we propose applies to any peer-to-peer network where nodes request and serve items, or serve
requests between other nodes. This includes peer-to-peer file-sharing systems [1], ad-hoc networks [5],
peer-to-peer lookup services [?], peer-to-peer streaming systems [8], or application-layer multicast overlays
[2, 3, 11], to name a few examples.

To simplify the presentation, we assume a DHT-like structure, defined by a quadruplet $(V, E, K, F)$,
where $V$ is the set of vertices in the network, $E$ is the set of edges, $K$ is the set of keys (items) in the network,
and $F : K \to V$ is the hash function that assigns keys to vertices. We denote by $K_i = \{k \in K : F(k) = i\}$
the set of keys stored at node $i \in V$. We have $K = \bigcup_i K_i$, and we assume, without loss of generality, that
the sets $K_i$ are disjoint.¹ We characterize each request with two independent random variables, $X \in V$ and
$Y \in K$, which denote the node $X$ making the request, and the key $Y$ being requested, respectively.

Consider a given node $i \in V$. Every time a key $k$ is requested in the entire network, node $i$ is in one of
four situations:

1. Node $i$ does not hold or request $k$, and is not on the routing path of the request. Node $i$ is not subject
to any cost.

2. Node $i$ holds key $k$, and pays a price $s_{i,k}$ for serving the request. We define the service cost $S_i$ incurred
by $i$ as the expected value of $s_{i,k}$ over all possible requests. That is,

$$S_i = \sum_{k \in K_i} s_{i,k} \Pr[Y = k] .$$

3. Node $i$ requests key $k$, and pays a price to look up and retrieve $k$. We model this price as $a_{i,k} t_{i,j}$,
where $t_{i,j}$ is the number of hops between $i$ and the node $j$ that holds the key $k$, and $a_{i,k}$ is a (positive)
proportional factor. We define the access cost suffered by node $i$, $A_i$, as the sum of the individual
costs $a_{i,k} t_{i,j}$ multiplied by the probability that key $k \in K_j$ is requested, that is,

$$A_i = \sum_{j \in V} \sum_{k \in K_j} a_{i,k} t_{i,j} \Pr[Y = k] , \tag{1}$$

with $t_{i,j} = \infty$ if there is no path from node $i$ to node $j$, and $t_{i,i} = 0$ for any $i$.

4. Node $i$ does not hold or request $k$, but has to forward the request for $k$, thereby paying a price $r_{i,k}$. The
overall routing cost $R_i$ experienced by node $i$ is the average, over all possible keys $k$, of the values of
$r_{i,k}$ such that $i$ is on the path of the request. That is, we consider the binary function

$$\chi_{j,l}(i) = \begin{cases} 1 & \text{if } i \text{ is on the path from } j \text{ to } l \text{, excluding } j \text{ and } l \text{,} \\ 0 & \text{otherwise,} \end{cases}$$

and express $R_i$ as

$$R_i = \sum_{j \in V} \sum_{l \in V} \sum_{k \in K_l} r_{i,k} \Pr[X = j] \Pr[Y = k] \chi_{j,l}(i) . \tag{2}$$

¹ If a key is stored on several nodes (replication), the replicas can be considered as different keys with the exact same probability
of being requested.

In addition, each node keeps some state information so that the protocol governing the DHT operates correctly.
In most DHT algorithms, each node $i$ maintains a neighborhood table, which grows linearly with the
out-degree $\deg(i)$ of the node, resulting in a maintenance cost $M_i$ given by

$$M_i = m_i \deg(i) ,$$

where $m_i \geq 0$ denotes the cost of keeping a single entry in the neighborhood table of node $i$.
Last, the total cost $C_i$ imposed on node $i$ is given by

$$C_i = S_i + A_i + R_i + M_i ,$$

which can be used to compute the total cost of the network, $C = \sum_{i \in V} C_i$. The topology that minimizes
$C$, or "social optimum," is generally not trivial. In particular, the social optimum is the full mesh only if
$m_i = 0$ for all $i$, and the empty set only if $a_{i,k} = 0$ for all $(i, k)$.
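
The per-node costs defined in this section can be evaluated numerically on any small topology. The following is a minimal Python sketch, not the authors' code. It assumes a connected graph, uniform $X$ and $Y$ with every node holding the same share of equally popular keys, identical unit prices $s$, $a$, $r$, $m$ at all nodes, and a single shortest path per pair (the one found by breadth-first search). The function names `bfs_parents` and `node_costs` are hypothetical.

```python
from collections import deque
from fractions import Fraction


def bfs_parents(adj, src):
    """Return {node: parent} for one shortest-path tree rooted at src."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def node_costs(adj, s, a, r, m):
    """Per-node cost C_i = S_i + A_i + R_i + M_i under uniform X and Y.

    adj maps each node to the list of its out-neighbors. The graph is
    assumed connected, so the t = infinity case never arises here.
    """
    nodes = list(adj)
    n = len(nodes)
    hops = {i: 0 for i in nodes}      # sum over j of t_{i,j}
    routed = {i: 0 for i in nodes}    # routes using i as an intermediate hop
    for src in nodes:
        parent = bfs_parents(adj, src)
        for dst in nodes:
            if dst == src:
                continue
            # Walk back from dst to src along the BFS tree.
            length, cur = 0, dst
            while parent[cur] is not None:
                cur = parent[cur]
                length += 1
                if cur != src:
                    routed[cur] += 1
            hops[src] += length
    costs = {}
    for i in nodes:
        service = Fraction(s) / n
        access = Fraction(a) * hops[i] / n
        routing = Fraction(r) * routed[i] / (n * n)
        maintenance = Fraction(m) * len(adj[i])
        costs[i] = service + access + routing + maintenance
    return costs


# Illustration: one hub (node 0) connected to four leaves, unit prices.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
costs = node_costs(star, s=1, a=1, r=1, m=1)
print(costs[0], costs[1])   # 137/25 13/5
```

Exact fractions are used so that the numerical values can be compared with closed-form expressions without rounding concerns.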

3 Case studies



We next apply the proposed cost model to a few selected routing geometries. We define a routing geometry
as in [7], that is, as a collection of edges, or topology, associated with a route selection mechanism. Unless
otherwise noted, we assume shortest path routing, and distinguish between different topologies. We derive
the various costs experienced by a node in each geometry, before illustrating the results with numerical
examples.

3.1 Analysis 

We consider a network of $N > 0$ nodes, and, for simplicity, assume that, for all $i$ and $k$, $s_{i,k} = s$, $a_{i,k} = a$,
$r_{i,k} = r$, and $m_i = m$. For the analysis in this section, we also assume that each node holds the same
number of keys, and that all keys have the same popularity. As a result, for all $i$,

$$\sum_{k \in K_i} \Pr[Y = k] = \frac{1}{N} ,$$

which implies

$$S_i = \frac{s}{N} ,$$

regardless of the geometry considered. We also assume that requests are uniformly distributed over the set
of nodes, that is, for any node $i$, $\Pr[X = i] = 1/N$.

Last, we assume that no node is acting maliciously.

Star network  The star frequently appears as an equilibrium in network formation studies using cost models
based on graph connectivity [3, 6, 9].

We use $i = 0$ to denote the center of the star, which routes all traffic between peripheral nodes. That is,
$\chi_{j,l}(0) = 1$ for any $j \neq l$ ($j > 0$, $l > 0$). Substituting in Eqn. (2), we get

$$R_0 = \frac{r(N-1)(N-2)}{N^2} .$$

The center node is located at a distance of one hop from all $(N - 1)$ other nodes, thus

$$A_0 = \frac{a(N-1)}{N} .$$

In addition, $\deg(0) = N - 1$, which implies that the cost incurred by the center of the star, $C_0$, is

$$C_0 = m(N-1) + \frac{s}{N} + \frac{a(N-1)}{N} + \frac{r(N-1)(N-2)}{N^2} . \tag{3}$$

Peripheral nodes do not route any traffic, i.e., $R_i = 0$ for all $i > 0$, and are located at a distance of one
from the center of the star, and at a distance of two from the $(N - 2)$ other nodes, giving

$$A_i = \frac{a(2N-3)}{N} .$$

Furthermore, $\deg(i) = 1$ for all peripheral nodes. Thus, $M_i = m$, and the total cost imposed on nodes $i > 0$
is

$$C_i = m + \frac{s + a(2N-3)}{N} . \tag{4}$$

The difference $C_0 - C_i$ quantifies the (dis)incentive to be in the center of the star. As expressed in the
following two theorems, there is a (dis)incentive to be in the center of the star in a vast majority of cases.

Theorem 1. If the number of nodes $N$ ($N > 0$) is variable, $C_0 \neq C_i$ unless $m = r = a = 0$.

Proof. Assume that $C_0 - C_i = 0$. Because $N \neq 0$, $C_0 - C_i = 0$ is equivalent to $N^2(C_0 - C_i) = 0$. Using
the expressions for $C_0$ and $C_i$ given in Eqs. (3) and (4), and rewriting the condition $N^2(C_0 - C_i) = 0$ as a
polynomial in $N$, we obtain

$$mN^3 - (2m + a - r)N^2 + (2a - 3r)N + 2r = 0 .$$

We can factor the above by $(N - 2)$, and obtain

$$(N - 2)(mN^2 - (a - r)N - r) = 0 . \tag{5}$$

A polynomial in $N$ is constantly equal to zero if and only if all of the polynomial coefficients are equal to
zero. Thus, Eqn. (5) holds for any value of $N$ if and only if:

$$\begin{cases} m = 0 , \\ a - r = 0 , \\ r = 0 . \end{cases}$$

The solutions of the above system of equations are $m = r = a = 0$. Hence, $C_0 - C_i = 0$ for any $N$ only
when nodes only pay an (arbitrary) price for serving data, while state maintenance, traffic forwarding, and
key lookup and retrieval come for free. □

Theorem 2. If the number of nodes $N$ ($N > 0$) is held fixed, and at least one of $m$, $r$, or $a$ is different from
zero, $C_0 = C_i$ only if $N = 2$ or $N = N_0$, where $N_0$ is a positive integer that must satisfy:

$$N_0 = \begin{cases} \dfrac{r}{r-a} & \text{if } m = 0 \text{ and } r \neq a , \\[2ex] \dfrac{a-r}{2m} + \sqrt{\left(\dfrac{a-r}{2m}\right)^2 + \dfrac{r}{m}} & \text{if } m \neq 0 . \end{cases} \tag{6}$$

Additionally, $C_0 \neq C_i$ for any $N \neq 2$ if $m = 0$ and $r = a$.

Proof. Recall from the proof of Theorem 1 that $C_0 - C_i = 0$ is equivalent to Eqn. (5). Clearly, setting
$N = 2$ satisfies Eqn. (5) for all values of $a$, $r$, and $m$. Assuming now that $N \neq 2$, to have $C_0 - C_i = 0$, we
need to have

$$mN^2 - (a - r)N - r = 0 . \tag{7}$$

Since at least one of $m$, $r$, or $a$ is not equal to zero, Eqn. (7) has at most two real solutions. We distinguish
between all possible cases for $m$, $r$, and $a$ such that at least one of $m$, $r$, and $a$ is different from zero.

If $m = 0$ and $a = r$, Eqn. (7) reduces to $r = 0$, which implies $m = r = a = 0$, thereby contradicting
the hypothesis that at least one of $m$, $r$, and $a$ is different from zero. Therefore, Eqn. (7) does not
admit any solution, i.e., there is a (dis)incentive to be in the center of the star regardless of $N$.

If $m = 0$ and $r \neq a$, the only solution to Eqn. (7) is

$$N_0 = \frac{r}{r - a} . \tag{8}$$

Note that if $r < a$, $N_0 < 0$, which is not feasible. (The number of nodes has to be positive.)

If $m \neq 0$, then Eqn. (7) admits two real roots (or a double root if $a = r = 0$), given by

$$N_0 = \frac{a-r}{2m} \pm \sqrt{\left(\frac{a-r}{2m}\right)^2 + \frac{r}{m}} .$$

However, because $r \geq 0$ and $m > 0$,

$$\frac{a-r}{2m} - \sqrt{\left(\frac{a-r}{2m}\right)^2 + \frac{r}{m}} \leq 0 ,$$

so that the only potentially feasible $N_0$ is given by

$$N_0 = \frac{a-r}{2m} + \sqrt{\left(\frac{a-r}{2m}\right)^2 + \frac{r}{m}} . \tag{9}$$

Combining Eqs. (8) and (9) yields the expression for $N_0$ given in Eqn. (6). Note that the expression
given in Eqn. (6) is only a necessary condition. In addition, $N_0$ has to be an integer so that we can set the
number of nodes $N$ to $N = N_0$. □
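
The two theorems can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it evaluates Eqs. (3) and (4) with exact fractions and searches for an integer $N_0$ satisfying Eqn. (6). It handles non-negative integer parameters only, and the names `star_costs` and `break_even_size` are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def star_costs(n, s, a, r, m):
    """Return (C_0, C_i) from Eqs. (3) and (4) as exact fractions."""
    n = Fraction(n)
    c_center = m * (n - 1) + s / n + a * (n - 1) / n + r * (n - 1) * (n - 2) / n ** 2
    c_leaf = m + (s + a * (2 * n - 3)) / n
    return c_center, c_leaf


def break_even_size(a, r, m):
    """Return the integer N_0 of Eqn. (6), or None if no such integer exists.

    Parameters are assumed to be non-negative integers in this sketch.
    """
    if m == 0:
        if r == a:
            return None                      # Eqn. (7) has no solution
        n0 = Fraction(r, r - a)              # Eqn. (8)
    else:
        # '+' root of m N^2 - (a - r) N - r = 0, as in Eqn. (9).
        disc = (a - r) ** 2 + 4 * m * r
        root = isqrt(disc)
        if root * root != disc:
            return None                      # irrational root, no integer N_0
        n0 = Fraction((a - r) + root, 2 * m)
    return int(n0) if n0 > 0 and n0.denominator == 1 else None


n0 = break_even_size(a=5, r=3, m=1)
print(n0)                                    # 3
c_center, c_leaf = star_costs(n0, s=7, a=5, r=3, m=1)
print(c_center == c_leaf)                    # True
```

With $a = 5$, $r = 3$, and $m = 1$, Eqn. (7) reads $N^2 - 2N - 3 = 0$, whose positive root is $N_0 = 3$; the service price $s$ cancels in $C_0 - C_i$, so any value can be used in the check.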

De Bruijn graphs  De Bruijn graphs are used in algorithms such as Koorde [10], Distance-Halving [15],
or ODRI [12], and are extensively discussed in [12, 20]. In a de Bruijn graph, any node $i$ is represented by
an identifier string $(i_1, \ldots, i_D)$ of $D$ symbols taken from an alphabet of size $\Delta$. The node represented by
$(i_1, \ldots, i_D)$ links to each node represented by $(i_2, \ldots, i_D, x)$ for all possible values of $x$ in the alphabet.
The resulting directed graph has a fixed out-degree $\Delta$, and a diameter $D$.

Denote by $V'$ the set of nodes such that the identifier of each node in $V'$ is of the form $(h, h, \ldots, h)$.
Nodes in $V'$ link to themselves, so that $M_i = m(\Delta - 1)$ for $i \in V'$. For nodes $i \notin V'$, the maintenance
cost $M_i$ is $M_i = m\Delta$. The next two lemmas will allow us to show that the routing cost at each node also
depends on the position of the node in the graph.

Lemma 1. With shortest-path routing, nodes $i \in V'$ do not route any traffic, and $R_i = 0$.

Proof. (By contradiction.) Consider a node $i \in V'$ with identifier $(h, h, \ldots, h)$, and suppose $i$ routes traffic
from a node $j$ to a node $k$. The nodes linking to $i$ are all the nodes with an identifier of the form $(x, h, \ldots, h)$,
for all values of $x$ in the alphabet. The nodes linked from $i$ are all the nodes of the form $(h, \ldots, h, y)$ for all
values of $y$ in the alphabet. Therefore, there exist $x_0$ and $y_0$ such that traffic from node $j$ to node $k$ follows
a path $P = (x_0, h, \ldots, h) \to (h, h, \ldots, h) \to (h, h, \ldots, y_0)$. Because, in a de Bruijn graph, there is an
edge between $(x_0, h, \ldots, h)$ and $(h, h, \ldots, y_0)$, traffic using the path $P$ between $j$ and $k$ does not follow
the shortest path. We arrive at a contradiction, which proves that $i$ does not route any traffic. □

Lemma 2. The number of routes $L_i$ passing through a given node $i$ is bounded by $L_i \leq L_{\max}$ with

$$L_{\max} = \frac{(D-1)\left(\Delta^{D+2} - (\Delta-1)^2\right) - D\Delta^{D+1} + \Delta^2}{(\Delta-1)^2} .$$

The bound is tight, since it can be reached when $\Delta \geq D$ for the node $(0, 1, 2, \ldots, D-1)$.

Proof. The proof follows the spirit of the proof used in [20] to bound the maximum number of routes passing
through a given edge. In a de Bruijn graph, by construction, each node maps to an identifier string of length
$D$, and each path of length $k$ hops maps to a string of length $D + k$, where each substring of $D$ consecutive
symbols corresponds to a different hop [12]. Thus, determining an upper bound on the number of paths of
length $k$ that pass through a given node $i$ is equivalent to computing the maximum number, $l_k$, of strings
of length $D + k$ that include node $i$'s identifier, $\sigma_i = (i_1, \ldots, i_D)$, as a substring. In each string of length
$D + k$ corresponding to a path including $i$, where $i$ is neither the source nor the destination of the path, the
substring $\sigma_i$ can start at one of $(k - 1)$ positions $(2, \ldots, k)$. There are $\Delta$ possible choices for each of the $k$
symbols in the string of length $D + k$ that are not part of the substring $\sigma_i$. As a result,

$$l_k \leq (k-1)\Delta^k .$$

With shortest path routing, the set of all paths going through node $i$ includes all paths that map to strings of
length $D + k$ with $k \in [1, D]$. So,

$$L_i \leq \sum_{k=1}^{D} l_k \leq \sum_{k=1}^{D} (k-1)\Delta^k = \frac{(D-1)\Delta^{D+2} - D\Delta^{D+1} + \Delta^2}{(\Delta-1)^2} . \tag{10}$$

We improve the bound given in Eqn. (10) by considering the strings of length $2D$ that are of the form $\sigma^*\sigma^*$,
where $\sigma^*$ is a string of length $D$. Strings of the form $\sigma^*\sigma^*$ denote a cycle $\sigma^* \to \sigma^*$, and cannot be a
shortest path in a de Bruijn graph. Hence, we can subtract the number of the strings $\sigma^*\sigma^*$ from the bound in
Eqn. (10). Because $\sigma_i = (i_1, \ldots, i_D)$ is a substring of $\sigma^*\sigma^*$ of length $D$, $\sigma^*$ has to be a circular permutation
of $\sigma_i$, for instance $(i_{D-1}, i_D, i_1, \ldots, i_{D-2})$. Since $i$ does not route any traffic when $i$ is the source of traffic,
$\sigma^* \neq \sigma_i$. Thus, there are only $(D - 1)$ possibilities for $\sigma^*$, and $(D - 1)$ strings $\sigma^*\sigma^*$. Subtracting $(D - 1)$
from the bound in Eqn. (10) yields $L_{\max}$.

□
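
As a sanity check on Lemma 2, the bound can be compared against a brute-force count on a small graph. The sketch below is illustrative only and assumes a fully populated identifier space ($N = \Delta^D$) with $\Delta \geq 2$. It uses the string view of paths described in the proof: the distance from a source to a destination is the smallest $k$ such that the last $D - k$ symbols of the source equal the first $D - k$ symbols of the destination. The names `l_max` and `routes_through` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def l_max(delta, d):
    """Upper bound of Lemma 2 on the number of routes through one node."""
    num = ((d - 1) * (delta ** (d + 2) - (delta - 1) ** 2)
           - d * delta ** (d + 1) + delta ** 2)
    return Fraction(num, (delta - 1) ** 2)


def routes_through(node, delta, d):
    """Count shortest paths that use `node` as an intermediate hop."""
    ids = list(product(range(delta), repeat=d))
    count = 0
    for src in ids:
        for dst in ids:
            if src == dst or node in (src, dst):
                continue
            # Smallest k with matching overlap; k = d always works.
            k = next(k for k in range(1, d + 1) if src[k:] == dst[:d - k])
            walk = src + dst[d - k:]                  # string of length d + k
            hops = [walk[j:j + d] for j in range(1, k)]
            if node in hops:
                count += 1
    return count


print(l_max(3, 2), routes_through((0, 1), 3, 2))      # 8 8
```

For $\Delta = 3$ and $D = 2$, the node $(0, 1)$ reaches the bound, in line with the tightness claim of the lemma for $\Delta \geq D$.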
From Lemmas 1 and 2, we infer that, in a de Bruijn graph, for any $i$, $j$ and $k$, $0 \leq \Pr[\chi_{i,j}(k) = 1] \leq L_{\max}/N^2$.
Because $\chi_{i,j}(k)$ is a binary function, $\Pr[\chi_{i,j}(k) = 1] = E[\chi_{i,j}(k)]$, and we finally obtain
$0 \leq R_i \leq R_{\max}$ with

$$R_{\max} = \frac{r L_{\max}}{N^2} .$$

We next compute upper and lower bounds on the access cost. To derive a tight upper bound on $A_i$, consider
a node $i \in V'$. Node $i$ links to itself and has only $(\Delta - 1)$ neighbors. Each neighbor of $i$ has itself $\Delta$
neighbors, so that there are $\Delta(\Delta - 1)$ nodes $k$ such that $t_{i,k} = 2$. By iteration and substitution in Eqn. (1),
we get, after simplification, $A_i \leq A_{\max}$, with

$$A_{\max} = a \, \frac{D\Delta^{D+1} - (D+1)\Delta^D + 1}{N(\Delta - 1)} ,$$

and $A_i = A_{\max}$ for nodes in $V'$.

Now, consider that each node $i$ has at most $\Delta$ neighbors. Then, node $i$ has at most $\Delta^2$ nodes at distance
2, at most $\Delta^3$ nodes at distance 3, and so forth. Hence, there are at least $N - \sum_{k=0}^{D-1} \Delta^k$ nodes at the
maximum distance of $D$ from node $i$. We get

$$A_i \geq \frac{a}{N} \left( \sum_{k=1}^{D-1} k\Delta^k + D \left( N - \sum_{k=0}^{D-1} \Delta^k \right) \right) ,$$

which reduces to $A_i \geq A_{\min}$, with

$$A_{\min} = \frac{a}{N} \left( DN + \frac{D}{\Delta - 1} - \frac{\Delta(\Delta^D - 1)}{(\Delta - 1)^2} \right) .$$

It can be shown that $A_i = A_{\min}$ for the node $(0, 1, \ldots, D - 1)$ when $\Delta \geq D$.

Note that the expressions for both $A_{\min}$ and $A_{\max}$ can be further simplified for $N = \Delta^D$, that is, when
the identifier space is fully populated.
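
The two access-cost bounds are straightforward to evaluate. The following sketch, with the hypothetical name `access_bounds`, assumes a fully populated identifier space ($N = \Delta^D$), $\Delta \geq 2$, and an integer price $a$. It is offered only as a way to reproduce the orders of magnitude involved.

```python
from fractions import Fraction


def access_bounds(delta, d, a=1):
    """Return (A_min, A_max) for a fully populated de Bruijn graph."""
    n = delta ** d
    a_max = Fraction(a * (d * delta ** (d + 1) - (d + 1) * delta ** d + 1),
                     n * (delta - 1))
    a_min = Fraction(a, n) * (d * n
                              + Fraction(d, delta - 1)
                              - Fraction(delta * (delta ** d - 1), (delta - 1) ** 2))
    return a_min, a_max


for delta, d in [(4, 4), (5, 4), (6, 3)]:
    lo, hi = access_bounds(delta, d)
    print((delta, d), round(float(lo), 2), round(float(hi), 2))
# (4, 4) 3.56 3.67
# (5, 4) 3.69 3.75
# (6, 3) 2.76 2.8
```

These three parameter pairs all satisfy $\Delta \geq D$, the condition under which the text states that the lower bound is attained. For $\Delta < D$ the lower bound need not be reached by any node.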

D-dimensional tori  We next consider $D$-dimensional tori, as in CAN [18], where each node is represented
by $D$ Cartesian coordinates, and has $2D$ neighbors, for a maintenance cost of $M_i = 2mD$ for any $i$.

Routing at each node is implemented by greedy forwarding to the neighbor with the shortest Euclidean
distance to the destination. We assume here that each node is in charge of an equal portion of the $D$-dimensional
space. From [18], we know that the average length of a routing path is $(D/4)N^{1/D}$ hops.² Because
we assume that the $D$-dimensional torus is equally partitioned, we conclude by symmetry that, for all $i$,

$$A_i = a \, \frac{D}{4} N^{1/D} .$$

To determine the routing cost $R_i$, we compute the number of routes passing through a given node $i$,
or node loading, as a function $L_{i,D}$ of the dimension $D$. With our assumption that the $D$-torus is equally
partitioned, $L_{i,D}$ is the same for all $i$ by symmetry. We next compute $L_{i,D}$ by induction on the dimension
$D$.

² Loguinov et al. [12] refined that result by distinguishing between odd and even values of $N$.
Figure 1: Routing in a ring. The numbers in parentheses represent the number of routes originating from
the black node that pass through each node. 



Base case ($D = 1$). For $D = 1$, the $D$-torus is a ring, as depicted in Figure 1 for $N = 7$. Each of
the diagrams in the figure corresponds to a case where the source of all requests, represented by a black
node, is held fixed. The numbers in each node $(0, \ldots, 6)$ represent the node coordinate, the different line
styles represent the different routes to all destinations, and the numbers in parentheses denote the number of
routes originating from the fixed source that pass through each of the other nodes. As shown in the figure,
shifting the source of all requests from 0 to $1, \ldots, 6$ only results in shifting the number of routes that pass
through each node. Hence, the node loading $L_{i,1}$ at each node $i$ is equal to the sum of the number of routes
passing through each node when the source is held fixed. In the figure, for $N = 7$, we have for any $i$,
$L_{i,1} = 0 + 1 + 2 + 2 + 1 + 0 = 6$. More generally, for $N$ odd, the sum of the number of routes passing
through each node is equal to

$$L_{i,1} = 2 \left( 1 + 2 + \ldots + \left( \frac{N-1}{2} - 1 \right) \right) = \frac{(N-1)(N-3)}{4} , \tag{11}$$

and for $N$ even, is given by

$$L_{i,1} = \left( 1 + 2 + \ldots + \left( \frac{N}{2} - 1 \right) \right) + \left( 1 + 2 + \ldots + \left( \frac{N}{2} - 2 \right) \right) = \frac{(N-2)^2}{4} . \tag{12}$$

We can express Eqs. (11) and (12) in a more compact form, which holds for any $N$,

$$L_{i,1} = \left( \left\lceil \frac{N}{2} \right\rceil - 1 \right) \left( \left\lfloor \frac{N}{2} \right\rfloor - 1 \right) . \tag{13}$$
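
The ring node loading can be verified by enumerating every source and destination pair. The sketch below is an illustration with hypothetical function names. It assumes that, when $N$ is even and the destination is diametrically opposite the source, the tie is always broken clockwise; this preserves the rotational symmetry used in the argument above.

```python
def ring_load_formula(n):
    """Closed form of Eqn. (13) for a ring of n nodes."""
    return ((n + 1) // 2 - 1) * (n // 2 - 1)


def ring_load_bruteforce(n, node=0):
    """Count routes that use `node` as an intermediate hop on an n-node ring."""
    count = 0
    for src in range(n):
        for dst in range(n):
            if src == dst or node in (src, dst):
                continue
            clockwise = (dst - src) % n
            # Ties (n even, opposite nodes) go clockwise: an assumption.
            step = 1 if clockwise <= n - clockwise else -1
            cur = (src + step) % n
            while cur != dst:
                if cur == node:
                    count += 1
                cur = (cur + step) % n
    return count


print(ring_load_formula(7), ring_load_bruteforce(7))   # 6 6
print(ring_load_formula(6), ring_load_bruteforce(6))   # 4 4
```

The value 6 for $N = 7$ matches the sum $0 + 1 + 2 + 2 + 1 + 0$ read off Figure 1.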



General case ($D > 1$). The key observation to compute the number of routes $L_{i,D}$ passing through
each node $i$ for $D > 1$ is that there are several equivalent shortest paths along the Cartesian coordinates,
because the coordinates of two consecutive nodes in a path cannot differ in more than one dimension.
Consider for instance, for $D = 2$, going from node $(0, 0)$ to node $(1, 1)$: both $P_1 = (0, 0) \to (1, 0) \to (1, 1)$
and $P_2 = (0, 0) \to (0, 1) \to (1, 1)$ are equivalent shortest paths. Therefore, we can always pick the path
that corrects coordinates successively, starting with the first coordinate, i.e., $P_1$ in the above example.

Figure 2: Routing in a 3-torus. Coordinates are corrected one at a time, first along the horizontal axis, then
along the vertical axis, and finally along the diagonal axis.

Consider a $D$-torus, as represented for $D = 3$ in Figure 2, where each of the $5^3 = 125$ nodes is
represented by a dot. The figure illustrates how requests are routed by correcting coordinates successively,
with the example of three different paths, $(2,3,2) \to (3,1,2)$, $(2,2,2) \to (0,4,0)$, and $(2,1,3) \to (4,0,4)$.
For any node $k$, we compute the number of routes passing through $k$. We denote the source of the route as
node $i$, and the destination of the route as node $j$. We have $i \neq j \neq k$. We further denote the coordinates of $i$,
$j$, and $k$ by $(i_1, \ldots, i_D)$, $(j_1, \ldots, j_D)$, and $(k_1, \ldots, k_D)$. We distinguish between the only three possibilities
for $k$ that are allowed by the routing scheme that corrects coordinates one at a time:

1. Node $k$ has the same $D$-th coordinate as both the source $i$ and the destination $j$, i.e., $i_D = j_D = k_D$. In
other words, the route $P = i \to j$ is entirely contained within a $(D-1)$-torus. This case is illustrated
in the figure for the route represented by a solid line going from $i = (2, 3, 2)$ to $j = (3, 1, 2)$ through
$k = (3, 2, 2)$. The corresponding $(D-1)$-torus containing $i$, $j$ and $k$ is denoted by the shaded box.
By definition of the node loading, the node loading resulting from all possible paths $P$ contained in
a $(D-1)$-torus is equal to $L_{i,D-1}$. There are $n = N^{1/D}$ different such $(D-1)$-tori in the $D$-torus under
consideration, one for each possible value of $i_D = j_D = k_D$. So, the total load incurred on each node
by all paths which remain contained within a $(D-1)$-torus is equal to $nL_{i,D-1}$.

2. Nodes $i$, $j$ and $k$ all differ in their $D$-th coordinate, i.e., $i_D \neq j_D \neq k_D$. Because coordinates are
corrected one at a time, for any $l \in (1, \ldots, D-1)$, we must have $k_l = j_l$. This case is illustrated in
the figure for the route represented by a dashed line, going from $i = (2, 2, 2)$ to $j = (0, 4, 0)$, through
$k = (0, 4, 1)$. Since $k_l = j_l$, nodes $j$ and $k$ belong to the same ring where only the $D$-th coordinate
varies. Such a ring is represented in the figure by the dotted curve. From node $k$'s perspective, routing
traffic from $i$ to $j$ is equivalent to routing traffic between nodes $i'$ and $j$, where node $i'$ satisfies $i'_l = k_l$
for any $l \in (1, \ldots, D-1)$, and $i'_D = i_D$. (In the figure, the coordinates of node $i'$ are $(0, 4, 2)$.)
From our hypothesis $k_D \neq i_D$, we have $k_D \neq i'_D$, which implies $k \neq i'$. Therefore, computing the
number of routes passing through node $k$ coming from $i$ is equivalent to computing the number of
routes passing through node $k$ and originating from node $i'$. Summing over all possible destination
nodes $j$, the computation of all routes passing through $k$ and originating from all nodes $i'$ in the same
ring as $j$ and $k$ is identical to the computation of the node loading in the base case $D = 1$. So, the load
imposed on node $k$ is equal to $L_{i,1}$. Now, summing over all possible nodes $i$ is equivalent to summing
over all possible rings where only the $D$-th coordinate varies. There are $n^{D-1}$ such rings in the
$D$-torus. We conclude that the total load incurred on each node $k$ by the paths going from all $i$ to all
$j$ satisfying $i_D \neq j_D \neq k_D$ is equal to $n^{D-1}L_{i,1}$.

3. Node $k$ has the same $D$-th coordinate as node $i$, and a $D$-th coordinate different from that of the
destination $j$. In other words, $i_D \neq j_D$, $i_D = k_D$. This situation is illustrated in the figure for the
route going from $i = (2, 1, 3)$ to $j = (4, 0, 4)$ and passing through $k = (4, 0, 3)$, and represented by a
thick dotted line. In this configuration, there are $(n - 1)$ possible choices for the destination node $j$
such that $j_D \neq k_D$, and $j_l = k_l$ for $l \leq D - 1$. There are $n^{D-1} - 1$ possible choices for the source
node $i$ such that $i_D = k_D$ and $i \neq k$. Hence, in this configuration, there is a total of $(n - 1)(n^{D-1} - 1)$
routes passing through each node $k$.

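Summing the node loadings obtained in all three possible cases above, we obtain

$$L_{i,D} = nL_{i,D-1} + n^{D-1}L_{i,1} + (n-1)(n^{D-1} - 1) .$$

Replacing $L_{i,1}$ by the expression given in Eqn. (13), using $n = N^{1/D}$, and removing the recursion in the
above relationship, we obtain, for any node $i$,

$$L_{i,D} = N^{\frac{D-1}{D}} \left( D \left( N^{\frac{1}{D}} - 1 + \left( \left\lceil \frac{N^{1/D}}{2} \right\rceil - 1 \right) \left( \left\lfloor \frac{N^{1/D}}{2} \right\rfloor - 1 \right) \right) - N^{\frac{1}{D}} \right) + 1 .$$

For all $i$, $R_i$ immediately follows from $L_{i,D}$ with

$$R_i = r \, \frac{L_{i,D}}{N^2} .$$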
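
The recursion on the dimension and its closed form can be cross-checked with a few lines of code. This sketch parametrizes the torus by its side length $n = N^{1/D}$ rather than by $N$, to stay within integer arithmetic. The function names are hypothetical, and the ring loading uses the compact form of Eqn. (13) applied to a ring of $n$ nodes.

```python
from fractions import Fraction


def ring_load(n):
    """Node loading of an n-node ring, Eqn. (13)."""
    return ((n + 1) // 2 - 1) * (n // 2 - 1)


def torus_load_recursive(n, d):
    """Unroll L_D = n L_{D-1} + n^(D-1) L_1 + (n - 1)(n^(D-1) - 1)."""
    load = ring_load(n)
    for dim in range(2, d + 1):
        load = (n * load
                + n ** (dim - 1) * ring_load(n)
                + (n - 1) * (n ** (dim - 1) - 1))
    return load


def torus_load_closed(n, d):
    """Closed form of the node loading, written in terms of n = N^(1/D)."""
    return n ** (d - 1) * (d * (n - 1 + ring_load(n)) - n) + 1


def torus_routing_cost(n, d, r=1):
    """R_i = r L_{i,D} / N^2 with N = n^D."""
    return Fraction(r * torus_load_closed(n, d), (n ** d) ** 2)


print(torus_load_recursive(5, 3), torus_load_closed(5, 3))   # 326 326
```

For the 125-node 3-torus of Figure 2 ($n = 5$, $D = 3$), both expressions give a node loading of 326.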



Plaxton trees  We next consider the variant of Plaxton trees [17] used in Pastry [19] or Tapestry [22].
Nodes are represented by a string $(i_1, \ldots, i_D)$ of $D$ digits in base $\Delta$. Each node is connected to $D(\Delta - 1)$
distinct neighbors of the form $(i_1, \ldots, i_{l-1}, x, y_{l+1}, \ldots, y_D)$, for $l = 1, \ldots, D$, and $x \neq i_l \in \{0, \ldots, \Delta - 1\}$.³
The resulting maintenance cost is $M_i = mD(\Delta - 1)$.

Among the different possibilities for the remaining coordinates $(y_{l+1}, \ldots, y_D)$, the protocols generally
select a node that is nearby according to a spatial proximity metric. We here assume that the spatial distribution
of the nodes is uniform, and that the identifier space is fully populated (i.e., $N = \Delta^D$), which enables
us to pick $y_{l+1} = i_{l+1}, \ldots, y_D = i_D$. Thus, two nodes $i$ and $j$ at a distance of $k$ hops differ in $k$ digits.
There are $\binom{D}{k}$ ways of choosing which digits are different, and each such digit can take any of $(\Delta - 1)$
values. So, for a given node $i$, there are $\binom{D}{k}(\Delta - 1)^k$ nodes that are at distance $k$ from $i$. Multiplying by
the total number of nodes $N = \Delta^D$, and dividing by the total number of paths $N^2$, we infer that, for all $i$, $j$,
and $k$, we have

$$\Pr[t_{i,j} = k] = \frac{\binom{D}{k}(\Delta - 1)^k}{N} . \tag{14}$$

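Now, for any $i$ and $j$ such that $t_{i,j} = l$, because routes are unique, there are exactly $(l - 1)$ different nodes
on the path between $i$ and $j$. So, the probability that a node $k$ picked at random is on the path from $i$ to $j$ is

$$\Pr[\chi_{i,j}(k) = 1 \mid t_{i,j} = l] = \frac{l - 1}{N} . \tag{15}$$

The total probability theorem tells us that

$$\Pr[\chi_{i,j}(k) = 1] = \sum_{l=1}^{D} \Pr[\chi_{i,j}(k) = 1 \mid t_{i,j} = l] \cdot \Pr[t_{i,j} = l] .$$

Substituting with the expressions obtained for $\Pr[t_{i,j} = l]$ and $\Pr[\chi_{i,j}(k) = 1 \mid t_{i,j} = l]$ in Eqs. (14) and (15)
gives:

$$\Pr[\chi_{i,j}(k) = 1] = \frac{1}{N^2} \sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l , \tag{16}$$

which can be simplified as follows. We write:

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = (\Delta - 1)^2 \sum_{l=1}^{D} (l - 1) \binom{D}{l} (\Delta - 1)^{l-2} ,$$

and rewrite the right-hand term as a function of the derivative of a series,

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = (\Delta - 1)^2 \frac{\partial}{\partial \Delta} \left( \sum_{l=1}^{D} \binom{D}{l} (\Delta - 1)^{l-1} \right) ,$$

or, equivalently,

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = (\Delta - 1)^2 \frac{\partial}{\partial \Delta} \left( \frac{1}{\Delta - 1} \sum_{l=1}^{D} \binom{D}{l} (\Delta - 1)^l \right) .$$

The binomial theorem allows us to simplify the above to:

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = (\Delta - 1)^2 \frac{\partial}{\partial \Delta} \left( \frac{(1 + \Delta - 1)^D - 1}{\Delta - 1} \right) ,$$

which, making the partial derivative explicit, becomes

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = (\Delta - 1)^2 \cdot \frac{D\Delta^{D-1}(\Delta - 1) - (\Delta^D - 1)}{(\Delta - 1)^2} ,$$

and reduces to

$$\sum_{l=1}^{D} \binom{D}{l} (l - 1)(\Delta - 1)^l = \Delta^{D-1}(D(\Delta - 1) - \Delta) + 1 .$$

Substituting in Eqn. (16) gives:

$$\Pr[\chi_{i,j}(k) = 1] = \frac{\Delta^{D-1}(D(\Delta - 1) - \Delta) + 1}{N^2} ,$$

which we multiply by $r$ to obtain

$$R_i = r \, \frac{\Delta^{D-1}(D(\Delta - 1) - \Delta) + 1}{N^2} . \tag{17}$$

³ For $\Delta = 2$, this geometry reduces to a hypercube.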

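To compute the access cost $A_i$, we use the relationship $A_i = aE[t_{i,j}]$. We have

$$E[t_{i,j}] = \sum_{k=1}^{D} k \Pr[t_{i,j} = k] ,$$

which, using the expression for $\Pr[t_{i,j} = k]$ given in Eqn. (14), implies

$$E[t_{i,j}] = \frac{1}{N} \sum_{k=0}^{D} k \binom{D}{k} (\Delta - 1)^k ,$$

and can be expressed in terms of the derivative of a classical series:

$$E[t_{i,j}] = \frac{\Delta - 1}{N} \frac{\partial}{\partial \Delta} \left( \sum_{k=0}^{D} \binom{D}{k} (\Delta - 1)^k \right) .$$

Using the binomial theorem, the series on the right-hand side collapses to $\Delta^D$, which yields

$$E[t_{i,j}] = \frac{\Delta - 1}{N} \frac{\partial (\Delta^D)}{\partial \Delta} .$$

We compute the partial derivative, and obtain

$$E[t_{i,j}] = \frac{D\Delta^{D-1}(\Delta - 1)}{N} .$$

Multiplying by $a$ to obtain $A_i$, we eventually get, for all $i$,

$$A_i = a \, \frac{D\Delta^{D-1}(\Delta - 1)}{N} ,$$

which can be simplified, using $N = \Delta^D$:

$$A_i = aD \, \frac{\Delta - 1}{\Delta} . \tag{18}$$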
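
The closed forms (17) and (18) can be compared with the sums they were derived from. The following sketch is only a numerical check under the same assumptions as the derivation (fully populated identifier space, $N = \Delta^D$, integer prices). The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def plaxton_costs(delta, d, a=1, r=1):
    """Return (R_i, A_i) from the closed forms (17) and (18)."""
    n = delta ** d
    routing = Fraction(r * (delta ** (d - 1) * (d * (delta - 1) - delta) + 1), n * n)
    access = Fraction(a * d * (delta - 1), delta)
    return routing, access


def plaxton_costs_direct(delta, d, a=1, r=1):
    """Return (R_i, A_i) by summing Eqs. (14) to (16) term by term."""
    n = delta ** d
    on_path = sum(comb(d, l) * (l - 1) * (delta - 1) ** l for l in range(1, d + 1))
    mean_hops = Fraction(sum(k * comb(d, k) * (delta - 1) ** k for k in range(d + 1)), n)
    return Fraction(r * on_path, n * n), a * mean_hops


print(plaxton_costs(2, 3) == plaxton_costs_direct(2, 3))   # True
```

For $\Delta = 2$ and $D = 3$ (an 8-node hypercube), both routes give $R_i = 5r/64$ and $A_i = 3a/2$.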
(Δ, D)    A_min    A_max    A_max/A_min    R'_min    R_max    R_max/R'_min
(2, 9)    7.18     8.00     1.11           3.89      17.53    4.51
(3, 6)    5.26     5.50     1.04           2.05      9.05     4.41
(4, 4)    3.56     3.67     1.03           5.11      13.87    2.71
(5, 4)    3.69     3.75     1.02           1.98      5.50     2.78
(6, 3)    2.76     2.80     1.01           5.38      9.99     1.86

Table 1: Asymmetry in costs in a de Bruijn graph (a = 1, r = 1000)
Figure 3: Costs in a de Bruijn network with Δ = 5, D = 4 and N = 625



Chord rings  In a Chord ring [21], nodes are represented using a binary string (i.e., $\Delta = 2$). When the ring
is fully populated, each node $i$ is connected to a set of $D$ neighbors, with identifiers $((i + 2^m) \bmod 2^D)$
for $m = 0, \ldots, D - 1$. An analysis identical to the above yields $R_i$ and $A_i$ as in Eqs. (17) and (18) for $\Delta = 2$.
Note that Eqn. (18) with $\Delta = 2$ is confirmed by experimental measurements [21].
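
The claim for Chord can be illustrated by enumerating routes on a small, fully populated ring. The sketch below assumes the usual greedy rule, in which each hop follows the largest finger that does not overshoot the destination; that rule and the function names are assumptions of this example, not details taken from the text above.

```python
from fractions import Fraction


def chord_route(src, dst, d):
    """Greedy finger routing on a fully populated Chord ring of 2**d nodes."""
    n = 1 << d
    path, cur = [src], src
    while cur != dst:
        gap = (dst - cur) % n
        # Largest power of two not exceeding the remaining clockwise gap.
        cur = (cur + (1 << (gap.bit_length() - 1))) % n
        path.append(cur)
    return path


def chord_stats(d, node=0):
    """Return (routes through `node`, mean hop count over all (i, j) pairs)."""
    n = 1 << d
    through, hops = 0, 0
    for src in range(n):
        for dst in range(n):
            path = chord_route(src, dst, d)
            hops += len(path) - 1
            if node in path[1:-1]:
                through += 1
    return through, Fraction(hops, n * n)


print(chord_stats(4))   # (17, Fraction(2, 1))
```

For $D = 4$, the count of 17 routes equals $\Delta^{D-1}(D(\Delta - 1) - \Delta) + 1$ with $\Delta = 2$, and the mean of 2 hops equals $D(\Delta - 1)/\Delta$, consistent with Eqs. (17) and (18).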

3.2 Numerical results 

We illustrate our analysis with a few numerical results. In Table 1, we consider five de Bruijn graphs with
different values for $\Delta$ and $D$, and $X$ and $Y$ i.i.d. uniform random variables. Table 1 shows that while the
access costs of all nodes are comparable, the ratio between $R_{\max}$ and the second best case routing cost,⁴
$R'_{\min}$, is in general significant. Thus, if $r \gg a$, there can be an incentive for the nodes with $R_i = R_{\max}$ to
defect. For instance, these nodes may leave the network and immediately come back, hoping to be assigned
a different identifier $i' \neq i$ and incurring a lower cost. Additional mechanisms, such as enforcing a cost of
entry to the network, may be required to prevent such defections. We graph the access and routing costs
for the case $\Delta = 5$, $D = 4$ and $N = 625$ in Figure 3. We plot the access cost of each node as a function
of the node identifier in Figure 3(a), and the routing cost of each node as a function of the node identifier
in Figure 3(b). Figure 3 further illustrates the asymmetry in costs evidenced in Table 1 by exhibiting that
different nodes have generally different access and routing costs. Therefore, in a de Bruijn graph, there
is potentially a large number of nodes that can defect, which, in turn, may result in network instability, if
defection is characterized by leaving and immediately rejoining the network.

⁴ That is, the minimum value for $R_i$ over all nodes but the $\Delta$ nodes in $V'$ for which $R_i = 0$.

Figure 4: Access and routing costs. Curves marked "sim" present simulation results.

Next, we provide an illustration by simulation of the costs in the different geometries. We choose
$\Delta = 2$, for which the results for Plaxton trees and Chord rings are identical. We choose $D = \{2, 6\}$ for the
$D$-dimensional tori, and $D = \log_\Delta N$ for the other geometries. We point out that selecting a value for $D$
and $\Delta$ common to all geometries may inadvertently bias one geometry against another. We emphasize that
we only illustrate a specific example here, without making any general comparison between different DHT
geometries.

We vary the number of nodes between $N = 10$ and $N = 1000$, and, for each value of $N$, run ten
differently seeded simulations, consisting of 100,000 requests each, with $X$ and $Y$ i.i.d. uniform random
variables. We plot the access and routing costs averaged over all nodes and all requests in Figure 4. The
graphs show that our analysis is validated by simulation, and that the star provides a lower average cost than
all the other geometries. In other words, a centralized architecture appears more desirable to the community
as a whole than a distributed solution. However, we stress that we do not consider robustness against attack,
fault-tolerance, or potential performance bottlenecks, all being factors that pose practical challenges in a
centralized approach, nor do we offer a mechanism creating an incentive to be in the center of the star.
While the cost model proposed here can be used to quantify the cost incurred by adding links for a higher
resiliency to failures, we defer that study to future work.
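
To give a sense of what such a simulation involves, the sketch below runs a Monte Carlo experiment on the star only. It is not the simulator used for Figure 4. It assumes that drawing the node holding the requested key uniformly at random is an adequate stand-in for a uniform $Y$ with equal numbers of keys per node, and the function names and the default seed are arbitrary choices of this example.

```python
import random


def simulate_star(n, requests, a=1.0, r=1.0, seed=0):
    """Estimate (mean access cost per request, routing cost R_0 of the center)."""
    rng = random.Random(seed)
    total_hops = 0
    center_forwards = 0
    for _ in range(requests):
        requester = rng.randrange(n)     # X; node 0 is the center of the star
        holder = rng.randrange(n)        # node storing the requested key Y
        if requester == holder:
            continue                     # served locally, zero hops
        if requester == 0 or holder == 0:
            total_hops += 1
        else:
            total_hops += 2              # leaf -> center -> leaf
            center_forwards += 1
    return a * total_hops / requests, r * center_forwards / requests


def star_analysis(n, a=1.0, r=1.0):
    """Same two quantities from the closed-form star expressions."""
    mean_access = (a * (n - 1) / n + (n - 1) * a * (2 * n - 3) / n) / n
    routing_center = r * (n - 1) * (n - 2) / n ** 2
    return mean_access, routing_center


sim = simulate_star(10, 100_000)
ana = star_analysis(10)
print(sim, ana)
```

With $N = 10$ the analytical values are 1.62 for the mean access cost and 0.72 for $R_0$; the simulated estimates should fall close to these, with a spread that shrinks as the number of requests grows.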

4 Discussion 

We proposed a model, based on experienced load and node connectivity, for the cost incurred by each peer
to participate in a peer-to-peer network. We argue such a cost model is a useful complement to topological
performance metrics [7, 12], in that it makes it possible to predict disincentives to collaborate (peers refusing to serve
requests to reduce their cost), discover possible network instabilities (peers leaving and re-joining in hopes
of lowering their cost), identify hot spots (peers with high routing load), and characterize the efficiency of a
network as a whole.

We believe, however, that this paper raises more questions than it provides answers. First, we only analyzed
a handful of DHT routing geometries, and even omitted interesting geometries such as the butterfly
[13], or geometries based on the XOR metric [14]. Applying the proposed cost model to deployed peer-to-peer
systems such as Gnutella or FastTrack could yield some insight regarding user behavior. Furthermore,
for the mathematical analysis, we used strong assumptions such as identical popularity of all items and
uniform spatial distribution of all participants. Relaxing these assumptions is necessary to evaluate the performance
of a geometry in a realistic setting. Also, obtaining a meaningful set of values for the parameters
$(s, a, r, m)$ for a given class of applications (e.g., file sharing between PCs, ad-hoc routing between energy-constrained
sensor motes) remains an open problem. Finally, identifying the minimal amount of knowledge
each node should possess to devise a rational strategy, or studying network formation with the proposed cost
model are other promising avenues for further research.